Skip to content

Add some logging to GPU detection#6992

Open
davidpanderson wants to merge 3 commits into
masterfrom
dpa_gpu_detect3
Open

Add some logging to GPU detection#6992
davidpanderson wants to merge 3 commits into
masterfrom
dpa_gpu_detect3

Conversation

@davidpanderson

@davidpanderson davidpanderson commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

In particular, for a generic OpenCL device, log the calculation of its FLOPS.

Also: add timestamp to GPU detection log messages.

Also: for CUDA, use cuMemGetInfo() instead of cuMemGetInfo_v2().
On my computer, the latter always returns a 201 error
('not context', even though we just created a context).


Summary by cubic

Add timestamps to GPU detection logs and improve diagnostics for CUDA and OpenCL. Prefer CUDA cuMemGetInfo() with fallback and clearer logging to avoid context errors.

  • New Features

    • Timestamp all GPU detection warnings.
    • Log OpenCL generic device peak FLOPS and inputs (compute units, clock MHz).
  • Bug Fixes

    • Prefer cuMemGetInfo() and fall back to cuMemGetInfo_v2(); always log return code plus free/total memory, and warn if neither is available. Avoids error 201 ("no context") seen on some systems.

Written for commit b663abb. Summary will update on new commits.

even though we just successfully created a context.
cuMemGetInfo() doesn't get this error.  So use it.
Note: I'm not sure if GetInfo() works for > 4GB;
my GPU is a bit less than 4GB

OpenCL: for generic devices, write FLOPS info to log file
Copilot AI review requested due to automatic review settings April 13, 2026 00:22

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 3 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="client/gpu_nvidia.cpp">

<violation number="1" location="client/gpu_nvidia.cpp:601">
P1: This forced `false` disables the only safe fallback and can null-dereference `p_cuMemGetInfo` when only `cuMemGetInfo_v2` is exported.</violation>

<violation number="2" location="client/gpu_nvidia.cpp:610">
P2: `gpu_warning()` records a persistent warning, so this new success log will mark normal CUDA RAM probes as warnings.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread client/gpu_nvidia.cpp Outdated
Comment thread client/gpu_nvidia.cpp Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves GPU detection diagnostics by adding more detailed logging around OpenCL generic device FLOPS estimation, timestamping GPU detection stderr output, and adjusting CUDA memory queries.

Changes:

  • Log computed peak FLOPS for generic OpenCL devices (and warn/fallback when out of range).
  • Add timestamps to GPU detection messages written via gpu_warning().
  • Switch NVIDIA available-RAM query path toward cuMemGetInfo() (away from _v2) and add a detailed return-value log.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
client/gpu_opencl.cpp Adds detailed logging for generic OpenCL peak-FLOPS calculation and fallback behavior.
client/gpu_nvidia.cpp Alters CUDA memory info function selection and logs cuMemGetInfo return/free/total details.
client/gpu_detect.cpp Prefixes gpu_warning() stderr output with a timestamp.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread client/gpu_detect.cpp
Comment on lines 835 to 837
void gpu_warning(vector<string> &warnings, const char* msg) {
fprintf(stderr, "%s\n", msg);
fprintf(stderr, "[%s] %s\n", time_to_string(dtime()), msg);
warnings.push_back(msg);

Copilot AI Apr 13, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gpu_warning() now calls time_to_string(dtime()), but this file doesn't include str_util.h where time_to_string() is declared. This will fail to compile; include str_util.h (or otherwise ensure time_to_string is declared) in this translation unit.

Copilot uses AI. Check for mistakes.
Comment thread client/gpu_nvidia.cpp
Comment on lines +600 to +605
// cuMemGetInfo_v2() always returns 201 (no context)
if (p_cuMemGetInfo) {
retval = (*p_cuMemGetInfo)(&memfree, &memtotal);
}
if (retval) {
snprintf(buf, sizeof(buf),
"[coproc] cuMemGetInfo(%d) returned %d", cc.device_num, retval
"cuMemGetInfo() returned %d; dev %d free %zu total %zu",
retval, cc.device_num, memfree, memtotal

Copilot AI Apr 13, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if (false && p_cuMemGetInfo_v2) branch disables cuMemGetInfo_v2() unconditionally. If only cuMemGetInfo_v2 is available (p_cuMemGetInfo is null), the else-branch will dereference a null function pointer and crash. Prefer selecting the available symbol (use v2 when v1 is missing), or try v2 first and fall back to v1 on CUDA_ERROR_INVALID_CONTEXT (201).

Copilot uses AI. Check for mistakes.
Comment thread client/gpu_opencl.cpp
Comment on lines +749 to +756
snprintf(buf2, sizeof(buf2),
"OpenCL generic: peak FLOPS %f; Max units %u, max freq %u MHz",
prop.peak_flops,
prop.max_compute_units, prop.max_clock_frequency
);
gpu_warning(warnings, buf2);
if (prop.peak_flops <= 0 || prop.peak_flops > GPU_MAX_PEAK_FLOPS) {
char buf2[256];
snprintf(buf2, sizeof(buf2),
"OpenCL generic: bad peak FLOPS; Max units %u, max freq %u MHz",
prop.max_compute_units, prop.max_clock_frequency
);
gpu_warning(warnings, buf2);
gpu_warning(warnings, "bad peak flops value; using default");

Copilot AI Apr 13, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hunk introduces tab-indented lines (prop.peak_flops argument and the subsequent gpu_warning call), which are inconsistent with the surrounding space indentation and make the block harder to read/maintain. Replace the tabs with spaces to match the file’s existing indentation style.

Copilot uses AI. Check for mistakes.
Comment thread client/gpu_opencl.cpp
Comment on lines 755 to 757
if (prop.peak_flops <= 0 || prop.peak_flops > GPU_MAX_PEAK_FLOPS) {
char buf2[256];
snprintf(buf2, sizeof(buf2),
"OpenCL generic: bad peak FLOPS; Max units %u, max freq %u MHz",
prop.max_compute_units, prop.max_clock_frequency
);
gpu_warning(warnings, buf2);
gpu_warning(warnings, "bad peak flops value; using default");
prop.peak_flops = GPU_DEFAULT_PEAK_FLOPS;

Copilot AI Apr 13, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new warning message "bad peak flops value; using default" is ambiguous on its own (it doesn’t identify the device/vendor or the rejected value). Consider including at least the computed peak_flops (and/or vendor/name) so logs make sense when multiple devices are enumerated.

Copilot uses AI. Check for mistakes.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="client/gpu_nvidia.cpp">

<violation number="1" location="client/gpu_nvidia.cpp:602">
P1: Handle `cuMemGetInfo*()` failures before updating available RAM; otherwise an error now turns `available_ram` into 0 instead of preserving the fallback total memory value.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread client/gpu_nvidia.cpp
else {
// cuMemGetInfo_v2() always returns 201 (no context)
if (p_cuMemGetInfo) {
retval = (*p_cuMemGetInfo)(&memfree, &memtotal);

@cubic-dev-ai cubic-dev-ai Bot Apr 13, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Handle cuMemGetInfo*() failures before updating available RAM; otherwise an error now turns available_ram into 0 instead of preserving the fallback total memory value.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At client/gpu_nvidia.cpp, line 602:

<comment>Handle `cuMemGetInfo*()` failures before updating available RAM; otherwise an error now turns `available_ram` into 0 instead of preserving the fallback total memory value.</comment>

<file context>
@@ -598,19 +598,22 @@ static void get_available_nvidia_ram(COPROC_NVIDIA &cc, vector<string>& warnings
     // cuMemGetInfo_v2() always returns 201 (no context)
-    if (false && p_cuMemGetInfo_v2) {
+    if (p_cuMemGetInfo) {
+        retval = (*p_cuMemGetInfo)(&memfree, &memtotal);
+        snprintf(buf, sizeof(buf),
+            "cuMemGetInfo() returned %d; dev %d free %zu total %zu",
</file context>
Fix with Cubic

@AenBleidd

Copy link
Copy Markdown
Member

On my computer, the latter always returns a 201 error
('not context', even though we just created a context).

I have nVidia GPU, I'll try to debug this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

3 participants